Text Mining with Hybrid Clustering Schemes
Authors
Abstract
Hybrid information retrieval (IR) schemes combine different normalization techniques and similarity functions. Hybrid schemes provide an efficient technique to improve precision and recall (see e.g., [4]). This paper reports a hybrid clustering scheme that applies a singular value decomposition (SVD) based algorithm followed by a k-means type clustering algorithm. The output of the first algorithm becomes the input of the next one. The second algorithm generates the final partition of the data set. We report results of numerical experiments performed with three k-means type clustering algorithms. Those are: the classical k-means (see e.g., [9]), the spherical k-means (see [7]), and the information-theoretical clustering algorithm introduced recently by [8] and [1]. A comparison with the results reported by [7] is provided.
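The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not say which SVD-based algorithm is used, so the first stage below is an assumption: it bisects the largest cluster along its leading singular direction until k clusters exist. The second stage is a plain spherical k-means refinement started from that partition. All function names (`svd_initial_partition`, `spherical_kmeans`, `hybrid_cluster`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def svd_initial_partition(X, k):
    """Assumed SVD-based stage: bisect the largest cluster along its
    leading singular direction until k clusters exist."""
    labels = np.zeros(X.shape[0], dtype=int)
    for new_label in range(1, k):
        sizes = np.bincount(labels, minlength=new_label)
        idx = np.where(labels == int(np.argmax(sizes)))[0]
        centered = X[idx] - X[idx].mean(axis=0)
        # Rows of vt are right singular vectors; vt[0] is the leading one.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        right = idx[centered @ vt[0] > 0]
        if right.size == 0 or right.size == idx.size:
            break  # the cluster cannot be split any further
        labels[right] = new_label
    return labels

def spherical_kmeans(X, labels, k, n_iter=20):
    """Refine a partition using cosine similarity to unit-norm centroids.
    Assumes every row of X is a nonzero document vector."""
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(n_iter):
        centroids = np.zeros((k, U.shape[1]))
        for j in range(k):
            total = U[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                centroids[j] = total / norm
        new_labels = np.argmax(U @ centroids.T, axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable
        labels = new_labels
    return labels

def hybrid_cluster(X, k):
    """The output of the first stage becomes the input of the second."""
    return spherical_kmeans(X, svd_initial_partition(X, k), k)

# Toy term-frequency matrix: rows are documents, columns are terms.
docs = np.array([
    [5, 4, 0, 0], [4, 5, 0, 0], [6, 5, 1, 0],
    [0, 0, 5, 4], [0, 1, 4, 5], [0, 0, 6, 5],
], dtype=float)
print(hybrid_cluster(docs, 2))
```

Swapping `spherical_kmeans` for a classical or information-theoretical k-means variant would change only the second stage; the first stage serves solely to supply the starting partition.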
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Hybrid Clustering of Text Mining and Bibliometrics Applied to Journal Sets
To exploit the correlated and complementary information contained in text mining and bibliometrics, hybrid clustering that incorporates textual content and citation information has become a popular strategy. In this paper, we propose a new computational framework for integrating text mining and bibliometrics to provide a mapping of journal sets. Two different approaches of hybrid clustering methods are ...
Implementation of Hybrid Clustering Algorithm with Enhanced K-Means and Hierarchal Clustering
We propose a hybrid clustering method that combines the strengths of both partitioning and agglomerative clustering methods. Clustering algorithms that build meaningful hierarchies out of large document collections are ideal tools for interactive visualization and exploration, as they provide data views that are consistent, predictable, and at different levels of granularit...
Domain Based Punjabi Text Document Clustering
Text clustering is a text mining technique that groups similar documents into a single cluster using a similarity measure, while separating dissimilar documents. Popular text clustering algorithms treat a document as a conglomeration of words; the syntactic and semantic relations between words are not taken into account. Many different algorithms ...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from large amounts of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into different groups. The most important step...